# FPGA-Based 16-Channel Digital Ultrasound Receive Beamformer

#### Mawia Ahmed Hassan

**Abstract**— Ultrasound imaging is an efficient, noninvasive method for medical diagnosis. A commonly used approach to image acquisition in ultrasound systems is digital beamforming. Digital beamforming, as applied to medical ultrasound, is defined as the phase alignment and summation of signals that are generated from a common source but received at different times by a multi-element ultrasound transducer. This paper presents an implementation of an FPGA-based 16-channel digital receive beamformer for ultrasound imaging. The system consists of two 8-channel blocks, which together implement the 16 channels, and a reconstructed-line block. The beamformer was designed using Xilinx System Generator (Xilinx, Inc.) and MATLAB Simulink (MathWorks, Inc.) and was implemented on a Virtex-5 FPGA. The total estimated power consumption was 7875.66 mW, and the device utilization was acceptable. The hardware architecture of the design provides flexibility for beamforming.

Index Terms— Ultrasound imaging; Digital receive beamforming; FPGA; Embedded DSP; FIR Hilbert filter; VHDL.

# **1 INTRODUCTION**

Ultrasound is defined as acoustic waves with frequencies above those that can be detected by the ear, from about 20 kHz to several hundred MHz. Ultrasound for medical applications typically uses only the portion of the ultrasound spectrum from 1 MHz to 50 MHz, owing to the combined needs of good resolution (small wavelength) and good penetrating ability (not too high a frequency) [1]. The waves are generated by converting a radio frequency (RF) electrical signal into mechanical vibration via a piezoelectric transducer [2]. The ultrasound waves propagate into the tissues of the body, where a portion is reflected; this reflected portion is used to generate the ultrasound image. A commonly used approach to image acquisition in ultrasound systems is digital beamforming. Digital beamforming, as applied to medical ultrasound, is defined as the phase alignment and summation [3] of signals that are generated from a common source but received at different times by a multi-element ultrasound transducer [4]. The commonly used arrays are linear, curved, or phased arrays. The important distinctions arise from the method of beam steering used with these arrays. For linear and curvilinear arrays, the steering is accomplished by selecting a group of elements whose location defines the phase center of the beam. In contrast to linear and curvilinear arrays, a phased array transducer requires that the beamformer steer the beam with a switched set of array elements [5]. These requirements introduce important differences in complexity relative to the linear and curved arrays.



A beamformer has two functions: it imparts directivity to the transducer (enhancing its gain), and it defines a focal point within the body, from which the location of the returning echo is derived.
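The phase alignment and summation described above can be illustrated with a minimal delay-and-sum sketch in Python (NumPy). It is not the implementation used in this paper: the flat-aperture geometry, the 30 mm focal depth, the rounding to whole samples, and the function names `focus_delays` and `delay_and_sum` are assumptions made for illustration. The element spacing, sound speed, and sampling rate are the values quoted in Section 3.1.

```python
import numpy as np

def focus_delays(n_ch, pitch, depth, c=1540.0, fs=50e6):
    """Receive delays (in samples) for an on-axis focal point.

    Assumes a flat aperture centred on the beam axis (illustrative only).
    """
    x = (np.arange(n_ch) - (n_ch - 1) / 2) * pitch    # element positions [m]
    extra_path = np.sqrt(x**2 + depth**2) - depth     # extra one-way path [m]
    return np.round(extra_path / c * fs).astype(int)

def delay_and_sum(rf, delays):
    """Align each channel by its delay, then sum across channels.

    rf has shape (channels, samples); delays holds one non-negative
    integer delay per channel.
    """
    n_ch, n_s = rf.shape
    line = np.zeros(n_s)
    for ch in range(n_ch):
        d = int(delays[ch])
        line[: n_s - d] += rf[ch, d:]                 # advance channel by d samples
    return line

# Synthetic check: one echo from the focal point, arriving later on outer elements.
delays = focus_delays(n_ch=16, pitch=0.516e-3, depth=30e-3)
rf = np.zeros((16, 512))
rf[np.arange(16), 100 + delays] = 1.0
line = delay_and_sum(rf, delays)
```

After alignment the 16 unit echoes add coherently at a single sample, which is the gain enhancement referred to above.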

Different articles introduced the issues involved in digital beamformer design including the description of its main components. Embedded digital beamforming was initially done using Application-Specific Integrated Circuits (ASICs) [6]. Many approaches also described the digital signal processing algorithms that can be used in digital beamforming signal demodulation [7][8][9][10]. Real-time digital ultrasound imaging is described in [11].

This paper presents an implementation of an FPGA-based 16-channel digital receive beamformer for ultrasound imaging.

Mawia A. Hassan, Sudan University of Science & Technology, Biomedical Engineering Department. E-mail: <u>mawiaahmed@sustech.edu</u>.

<sup>•</sup> Mawia A. Hassan received his B.Sc. degree from the Biomedical Engineering Department at Sudan University of Science & Technology in 2002. He received his M.Sc. and Ph.D. degrees from the Biomedical Engineering Department at Cairo University in 2007 and 2011, respectively. He is currently the head of the Biomedical Engineering Department at Sudan University of Science & Technology. His research interests include medical image processing and analysis, in particular MRI and ultrasound imaging, and multidimensional signal processing for biomedical applications.



# **2 METHODOLOGY**

### 2.1 The proposed system

A typical architecture of the modular FPGA-based 16-channel digital ultrasound receive beamformer with embedded DSP for ultrasound imaging is shown in Fig. 1. The system consists of two 8-channel blocks and a reconstructed-line block. The beamformer was designed using Xilinx System Generator (Xilinx, Inc.) and MATLAB Simulink (MathWorks, Inc.). The system was implemented on a Virtex-5 FPGA.

### 2.2 The implementation steps

The contents of the implementation blocks for one channel are shown in Fig. 2. The implementation steps are:

- 1. The RF data were saved in the MATLAB workspace, and a Simulink block was used to read the one-dimensional RF data from the workspace.
- 2. The RF data were then converted from double-precision to fixed-point numeric precision for hardware efficiency.
- 3. The fixed-point model was verified by comparing the fixed-point results to the floating-point results and determining whether the quantization error was acceptable.
- 4. After verifying the model, we used a dual-port RAM block (Table 1) with a depth of 1024 data words to hold the RF samples.

International Journal of Scientific & Engineering Research Volume 5, Issue 6, June-2014 ISSN 2229-5518

- 5. We then filled this RAM continuously through one port, addressed by a 10-bit counter.
- 6. Each time the most significant bit (MSB) of that counter toggles, it indicates that one 512-word frame has been written.
- 7. The second port (for reading) used the inverted MSB, so while the upper half of the memory was being written, the lower half was being read, and vice versa.
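The conversion and verification in steps 2 and 3 can be sketched in Python as below. The 16-bit word with 15 fractional bits, the round-to-nearest quantizer with saturation, the synthetic test signal, and the helper names `to_fixed` and `sqnr_db` are assumptions for illustration; the paper does not state the fixed-point format used at this stage.

```python
import numpy as np

def to_fixed(x, word_bits=16, frac_bits=15):
    """Quantize x to signed fixed point (round to nearest, saturate)."""
    scale = 2.0 ** frac_bits
    lo = -(2 ** (word_bits - 1))
    hi = 2 ** (word_bits - 1) - 1
    codes = np.clip(np.round(x * scale), lo, hi)      # integer codes
    return codes / scale                              # back to real units

def sqnr_db(ref, test):
    """Signal-to-quantization-noise ratio in dB."""
    err = ref - test
    return 10.0 * np.log10(np.sum(ref**2) / np.sum(err**2))

# Synthetic stand-in for one normalized RF line (not the paper's data).
n = np.arange(4096)
rf_float = 0.9 * np.sin(2 * np.pi * 3.5e6 / 50e6 * n) * np.exp(-n / 2000.0)
rf_fixed = to_fixed(rf_float)

max_err = np.max(np.abs(rf_float - rf_fixed))         # at most half an LSB
quality = sqnr_db(rf_float, rf_fixed)
```

Whether a given error level is acceptable depends on the dynamic range needed downstream, so the threshold is left to the designer.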

TABLE 1. THE RAM BLOCK PARAMETERS

| Parameter                                                  | Value            |
|------------------------------------------------------------|------------------|
| Depth                                                      | 1024             |
| Initial value vector                                       | 0                |
| Memory Type                                                | Block RAM        |
| Initial value for port A output register                   | 0                |
| Initial value for port B output register                   | 0                |
| Provide synchronous reset port for port A output register  | Off              |
| Provide synchronous reset port for port B output register  | Off              |
| Provide enable port for port A                             | Off              |
| Provide enable port for port B                             | Off              |
| Latency                                                    | 1                |
| Port A                                                     | Read After Write |
| Port B                                                     | Read After Write |
| Override with doubles                                      | Off              |
| Optimize for                                               | Area             |
| Use core placement information                             | On               |
| Define FPGA area for resource estimation                   | Off              |

- 8. We used a ROM block (Table 2) to store the dynamic focusing line.
- 9. The lower 9 bits of the counter were used to address the ROM holding the dynamic focusing line values.

TABLE 2. THE ROM BLOCK PARAMETERS

| Parameter                                  | Value           |
|--------------------------------------------|-----------------|
| Depth                                      | 512             |
| Initial value vector                       | Focusing Vector |
| Memory Type                                | Block RAM       |
| Provide reset port for output register     | Off             |
| Initial value for output register          | 0               |
| Provide enable port                        | Off             |
| Latency                                    | 1               |
| Word type                                  | Unsigned        |
| Number of bits                             | 9               |
| Binary point                               | 0               |
| Override with doubles                      | Off             |
| Optimize for                               | Area            |
| Use pre-defined core placement information | On              |
| Define FPGA area for resource estimation   | Off             |

- 10. The read-port address was formed from the 9 bits coming from the ROM and the inverted MSB of the counter, joined using a Concat block. Thus one 512-word block is always being written while the other is simultaneously being read:
- Writing the low 512 addresses while reading the high 512 addresses.
- Writing the high 512 addresses while reading the low 512 addresses.
- This so-called "bank switching" is done by the MSB of the addresses; since the write and read MSBs are inverses of each other, the behavior above is achieved.
- 11. We placed two comparators and two MUXs (Table 3) after the data outputs of the dual-port RAMs for beamforming.

TABLE 3. THE MUX BLOCK PARAMETERS

| Parameter                                | Value             |
|------------------------------------------|-------------------|
| Number of inputs                         | 2                 |
| Provide enable port                      | Off               |
| Latency                                  | 0                 |
| Precision                                | User Defined      |
| Output type                              | Signed (2's comp) |
| Number of bits                           | 16                |
| Binary point                             | 0                 |
| Quantization                             | Truncate          |
| Overflow                                 | Wrap              |
| Override with doubles                    | Off               |
| Define FPGA area for resource estimation | Off               |
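The double-buffered addressing built from the 10-bit counter, the focusing ROM, and the Concat block can be sketched in Python as follows. The function name `ram_addresses` and the contents of `focus_rom` are hypothetical (the actual focusing vector is not given in the paper), and the sketch models only the address arithmetic, not the comparator and MUX stage.

```python
FRAME = 512                  # words per bank (ROM depth, Table 2)
DEPTH = 2 * FRAME            # dual-port RAM depth (Table 1)

def ram_addresses(counter, focus_rom):
    """Return (write_addr, read_addr) for one value of the 10-bit counter."""
    counter &= DEPTH - 1                           # wrap to 10 bits
    msb = counter >> 9                             # bank currently being written
    rom_addr = counter & (FRAME - 1)               # lower 9 bits address the ROM
    rom_word = focus_rom[rom_addr] & (FRAME - 1)   # 9-bit focusing value
    read_addr = ((msb ^ 1) << 9) | rom_word        # Concat(inverted MSB, ROM word)
    return counter, read_addr

# Hypothetical focusing vector: read each sample 3 positions ahead, clamped.
focus_rom = [min(i + 3, FRAME - 1) for i in range(FRAME)]

# Bank (MSB) of the write and read address for every counter value.
banks = [(w >> 9, r >> 9) for w, r in
         (ram_addresses(c, focus_rom) for c in range(DEPTH))]
```

Because the read bank is always the complement of the write bank, a frame is never read while it is still being filled.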

- 12. As the algorithm describes, one of the MUX data inputs should be tied to '0'.
- 13. After delaying the samples of each RF channel, an M-code block was used to sum the 8 channel signals.
- 14. The sums from the two 8-channel blocks were connected to a pipelined adder, and the output of each adder was connected to another adder, to reconstruct one focused ultrasound line.
- 15. We converted the signal to 16 bits using a bit modifier block.

TABLE 4. THE HILBERT FILTER BLOCK PARAMETERS

| Parameter                      | Value                        |
|--------------------------------|------------------------------|
| Coefficient Vector             | Array(y1)                    |
| Number of Coefficient Sets     | 1                            |
| Filter Type                    | Single_Rate                  |
| Rate Change Type               | Integer                      |
| Interpolation Rate Value       | 1                            |
| Decimation Rate Value          | 1                            |
| Zero Pack Factor               | 1                            |
| Number of Channels             | 1                            |
| Select format                  | Hardware_Oversampling_Rate   |
| Sample period                  | 1                            |
| Hardware Oversampling Rate     | 1                            |
| Filter Architecture            | Systolic_Multiply_Accumulate |
| Coefficient Type               | Signed                       |
| Quantization                   | Quantize_Only                |
| Coefficient Width              | 16                           |
| Best Precision Fraction Length | On                           |
| Coefficient Fractional Bits    | 15                           |
| Output Width                   | 33                           |
| Optimization Goal              | Area                         |
| Number of samples              | 0                            |

- 16. A Register block was then applied (data presented at its input appear at its output after one sample period).
- 17. An FIR Hilbert filter block (Table 4) was used to obtain the quadrature component.
- 18. A fractional delay filter (in-phase filter) block was used to compensate for the filter delay when a high FIR order is used.
- 19. The signals from steps 17 and 18 were then converted to 16 bits again using bit modifier blocks.
- 20. An envelope detection block computed the envelope of the two signals coming from steps 17 and 18.
- 21. To obtain performance and logic utilization figures for the proposed architecture, it was implemented in a hardware description language (VHDL).
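The quadrature filtering, delay compensation, and envelope detection of steps 17 to 20 can be sketched in Python (NumPy) as below. This is an illustration under stated assumptions rather than the hardware design: a Hamming-windowed Hilbert transformer of even order 24 (25 taps) stands in for the filter of Table 4, so that its group delay is a whole number of samples and the in-phase path reduces to a plain delay. A filter with an even number of taps has a half-sample group delay instead, which calls for a fractional delay in the in-phase path. The names `hilbert_taps` and `envelope` are hypothetical.

```python
import numpy as np

def hilbert_taps(order=24):
    """Hamming-windowed ideal Hilbert transformer with order + 1 taps."""
    n = np.arange(order + 1) - order // 2     # tap index relative to the centre
    h = np.zeros(order + 1)
    odd = n % 2 != 0
    h[odd] = 2.0 / (np.pi * n[odd])           # even-indexed taps stay zero
    return h * np.hamming(order + 1)

def envelope(rf, order=24):
    """Envelope from the in-phase (delayed) and quadrature (filtered) paths."""
    h = hilbert_taps(order)
    q = np.convolve(rf, h)[: len(rf)]         # quadrature, delayed by order / 2
    i = np.concatenate((np.zeros(order // 2), rf))[: len(rf)]   # matching delay
    return np.sqrt(i**2 + q**2)

fs, f0 = 50e6, 3.5e6                          # sampling and centre frequency
n = np.arange(512)
rf = np.cos(2 * np.pi * f0 / fs * n)          # constant-amplitude test tone
env = envelope(rf)                            # close to 1 once the filter has filled
```

The square root is the costly operation in hardware; how the envelope block in this design computes it is not described in the paper.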

# **3 EXPERIMENTAL VERIFICATION**

### 3.1 The ultrasound data

The data were acquired from a resolution phantom and were collected at IBE Tech, Giza, Egypt. The number of channels acquired was 32, the scan depth was 6 cm, and the ADC sampling rate was 50 MSPS (50 MHz). A curvilinear array transducer was used to acquire the data, with a central frequency of 3.5 MHz and an element spacing of 0.516 mm. Each ultrasonic A-scan was saved in a record consisting of 4096 RF samples per line, each represented in 2 bytes. The speed of ultrasound in the phantom was 1540 m/s.
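As a rough consistency check on these acquisition parameters (a worked calculation added for illustration, assuming two-way pulse-echo propagation and that each record starts at the transducer face), the depth covered by one sample, the depth covered by a 4096-sample record, and the wavelength at the central frequency are:

$$
\Delta z = \frac{c}{2 f_s} = \frac{1540\ \text{m/s}}{2 \times 50 \times 10^{6}\ \text{Hz}} = 15.4\ \mu\text{m},
\qquad
z_{\max} = 4096\,\Delta z \approx 6.3\ \text{cm},
\qquad
\lambda = \frac{c}{f_0} = \frac{1540\ \text{m/s}}{3.5 \times 10^{6}\ \text{Hz}} = 0.44\ \text{mm}.
$$

The record length is therefore consistent with the stated 6 cm scan depth, and the 0.516 mm element spacing corresponds to about 1.17 wavelengths.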

### 3.2 Delay

Fig. 3(a) illustrates one channel after applying the dynamic focusing and before correcting the DC shift, for the simulated and implemented signals. As can be seen, there is a difference in some parts of the implemented signal compared to the simulated one. The reason was high-power noise at the beginning of the signal. Fig. 3(b) shows the comparison between the implemented and simulated focused signals after removing the noise and applying time gain compensation (TGC) to compensate for attenuation in the medium. As can be seen, the signals look the same.



Fig. 4. Frequency spectra for the ultrasound line. (a) Simulated real signal, (b) Implemented real signal, (c) Simulated analytic signal, (d) Implemented analytic signal.

http://www.ijser.org


### 3.3 The reconstructed line

A 24-tap FIR Hilbert filter was used for the simulation and implementation of the ultrasound data. Fig. 4(a) and Fig. 4(b) show the frequency spectra of the simulated and implemented single-channel real signals, respectively. Fig. 4(c) and Fig. 4(d) show the frequency spectra of the simulated and implemented single-channel analytic signals, respectively. As can be seen, the negative frequencies were eliminated, in agreement with the frequency spectrum of the simulated Hilbert filter.

### 3.4 Envelope Detection

Fig. 5 compares the simulated and implemented envelope detection from the real and quadrature components. As can be seen, the result was acceptable.



### 3.5 Power Consumption

Table 5 summarizes the power consumption of the 16-channel beamforming implementation on the Virtex-5. The total estimated power consumption was 4130.48 mW. Table 6 summarizes the power consumption of the reconstructed-line implementation on the Virtex-5. The total estimated power consumption was 3745.18 mW.

### 3.6 Device Utilization

Tables 7 and 8 show the device utilization summaries for the 16-channel beamforming block implementation and the reconstructed-line implementation, respectively. The tables list the resources used, the resources available on the device, and the utilization as a percentage for the Virtex-5 FPGA.

# **4 CONCLUSION**

The system was implemented on a Virtex-5 FPGA. We used three ports of the Virtex-5: two 8-channel beamforming blocks for the 16-channel beamforming, and the third for the reconstructed line. This offers the opportunity to build 16-, 32-, and 64-channel digital beamformers. The implementation results showed that the fixed-point model matches the floating-point model, which is important for hardware efficiency.

TABLE 5. POWER CONSUMPTION IN 16-CHANNEL BEAMFORMING

| Power summary                     | I (mA)  | P (mW)  |
|-----------------------------------|---------|---------|
| Total Vccint 1.00V                | 2943.72 | 2943.72 |
| Total Vccaux 2.50V                | 351.10  | 877.76  |
| Total Vcco25 2.50V                | 123.74  | 309.36  |
| BRAM                              | -       | 125.66  |
| Clocks                            | -       | 53.09   |
| DSP                               | -       | 0.00    |
| IO                                | -       | 300.31  |
| Logic                             | -       | 0.73    |
| Signals                           | -       | 13.46   |
| Quiescent Vccint 1.00V            | 2745.09 | 2745.09 |
| Quiescent Vccaux 2.50V            | 345.00  | 862.50  |
| Quiescent Vcco25 2.50V            | 12.00   | 30.00   |
| Total estimated power consumption | -       | 4130.48 |

TABLE 6. POWER CONSUMPTION IN THE RECONSTRUCTED LINE

| Power summary                     | I (mA)  | P (mW)  |
|-----------------------------------|---------|---------|
| Total Vccint 1.00V                | 2759.26 | 2759.26 |
| Total Vccaux 2.50V                | 346.94  | 867.34  |
| Total Vcco25 2.50V                | 47.43   | 118.58  |
| Clocks                            | -       | 30.58   |
| DSP                               | -       | 1.58    |
| IO                                | -       | 95.44   |
| Logic                             | -       | 7.92    |
| Signals                           | -       | 6.11    |
| Quiescent Vccint 1.00V            | 2711.04 | 2711.04 |
| Quiescent Vccaux 2.50V            | 345.00  | 862.50  |
| Quiescent Vcco25 2.50V            | 12.00   | 30.00   |
| Total estimated power consumption | -       | 3745.18 |

In addition, the delays applied using dynamic focusing synchronized the times of arrival and improved the lateral resolution. Furthermore, the Hilbert filter was implemented in a form in which the zero-valued tap coefficients are not computed, so an order-L filter uses only L/2 multiplications. This reduced the computational time by half. The total estimated power consumption for the 16-channel beamforming ports was 4130.48 mW, and the device utilization was acceptable. The total estimated power consumption for the reconstructed-line port was 3745.18 mW, giving 7875.66 mW for the whole system, and the device utilization was also acceptable.
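The saving from skipping zero-valued taps can be seen from the ideal Hilbert transformer impulse response. The expression below is the textbook form for an odd-length (even order L) design centered at n = 0, given here as an illustration; the paper does not list its coefficients, and any window applied to the taps leaves the zero positions unchanged:

$$
h[n] =
\begin{cases}
\dfrac{2}{\pi n}, & n \text{ odd}, \\[4pt]
0, & n \text{ even},
\end{cases}
\qquad -\frac{L}{2} \le n \le \frac{L}{2}.
$$

For example, with L = 24 the only non-zero taps are at n = ±1, ±3, ..., ±11, that is 12 = L/2 multiplications per output sample instead of L + 1 = 25.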

TABLE 7. DEVICE UTILIZATION SUMMARY IN THE 16-CHANNEL BEAMFORMING

| Slice Logic Utilization                                         | Used | Available | Utilization |
|-----------------------------------------------------------------|------|-----------|-------------|
| Number of Slice Registers                                       | 276  | 207,360   | 1%          |
| Number used as Flip Flops                                       | 276  | -         | -           |
| Number of Slice LUTs                                            | 964  | 207,360   | 1%          |
| Number used as logic                                            | 946  | 207,360   | 1%          |
| Number using O6 output only                                     | 594  | -         | -           |
| Number using O5 output only                                     | 237  | -         | -           |
| Number using O5 and O6                                          | 115  | -         | -           |
| Number used as exclusive route-thru                             | 18   | -         | -           |
| Number of route-thrus                                           | 261  | -         | -           |
| Number using O6 output only                                     | 255  | -         | -           |
| Number using O5 output only                                     | 6    | -         | -           |
| Number of occupied Slices                                       | 424  | 51,840    | 1%          |
| Number of LUT Flip Flop pairs used                              | 995  | -         | -           |
| Number with an unused Flip Flop                                 | 719  | 995       | 72%         |
| Number with an unused LUT                                       | 31   | 995       | 3%          |
| Number of fully used LUT-FF pairs                               | 245  | 995       | 24%         |
| Number of unique control sets                                   | 1    | -         | -           |
| Number of slice register sites lost to control set restrictions | 0    | 207,360   | 0%          |
| Number of bonded IOBs                                           | 863  | 1,200     | 71%         |
| Number of BlockRAM/FIFO                                         | 32   | -         | -           |
| Number using BlockRAM only                                      | 35   | -         | -           |
| Number of 18k BlockRAM used                                     | 48   | -         | -           |
| Total Memory used (KB)                                          | 864  | 10,368    | 8%          |
| Number of BUFG/BUFGCTRLs                                        | 1    | 32        | 3%          |
| Number used as BUFGs                                            | 1    | -         | -           |
| Number of DSP48Es                                               | 16   | 192       | 8%          |
| Average Fanout of Non-Clock Nets                                |      |           |             |

TABLE 8. DEVICE UTILIZATION SUMMARY IN THE RECONSTRUCTED LINE

| Slice Logic Utilization                                         | Used | Available | Utilization |
|-----------------------------------------------------------------|------|-----------|-------------|
| Number of Slice Registers                                       | 678  | 207,360   | 1%          |
| Number used as Flip Flops                                       | 678  | -         | -           |
| Number of Slice LUTs                                            | 567  | 207,360   | 1%          |
| Number used as logic                                            | 163  | 207,360   | 1%          |
| Number using O6 output only                                     | 147  | -         | -           |
| Number using O5 output only                                     | 1    | -         | -           |
| Number using O5 and O6                                          | 15   | -         | -           |
| Number used as Memory                                           | 403  | 54,720    | 1%          |
| Number used as Shift Register                                   | 403  | -         | -           |
| Number using O6 output only                                     | 403  | -         | -           |
| Number used as exclusive route-thru                             | 1    | -         | -           |
| Number of route-thrus                                           | 3    | -         | -           |
| Number using O6 output only                                     | 2    | -         | -           |
| Number using O5 output only                                     | 1    | -         | -           |
| Number of occupied Slices                                       | 249  | 51,840    | 1%          |
| Number of LUT Flip Flop pairs used                              | 728  | -         | -           |
| Number with an unused Flip Flop                                 | 50   | 728       | 6%          |
| Number with an unused LUT                                       | 161  | 728       | 22%         |
| Number of fully used LUT-FF pairs                               | 517  | 728       | 71%         |
| Number of unique control sets                                   | 26   | -         | -           |
| Number of slice register sites lost to control set restrictions | 3    | 207,360   | 1%          |
| Number of bonded IOBs                                           | 97   | 1,200     | 8%          |
| Number of BUFG/BUFGCTRLs                                        | 1    | 32        | 3%          |
| Number used as BUFGs                                            | 1    | -         | -           |
| Number of DSP48Es                                               | 3    | 192       | 1%          |
| Average Fanout of Non-Clock Nets                                | 1.81 | -         | -           |

#### REFERENCES

- [1] D. A. Christensen, *Ultrasonic Bioinstrumentation*, John Wiley & Sons, New York, 1988.
- [2] J. A. Zagzebski, *Essentials of Ultrasound Physics*, St. Louis, MO: Mosby, 1996.
- [3] R. A. Mucci, "A Comparison of Efficient Beamforming Algorithms," *IEEE Trans. Acoustics, Speech, and Signal Proc.*, vol. 32, pp. 548-558, 1984.


- [4] R. Reeder, C. Petersen, "The AD9271-A Revolutionary Solution for Portable Ultrasound," *Analog Dialogue* 41-07, Analog Devices, July 2007.
- [5] K. E. Thomenius, "Evolution of Ultrasound Beamformers," in *Proc. IEEE Ultrason. Symp.*, 1996, pp. 1615-1621.
- [6] B.D. Steinberg, "Digital beamforming in ultrasound," *IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control*, vol. 39, no. 6, 1992, pp.716-721.
- [7] C. Fritsch, M. Parrilla, T. Sanchez, O. Martinez, "Beamforming with a reduced sampling rate," *Ultrasonics*, vol. 40, 2002, pp. 599–604.
- [8] S. R. Freeman, M. K. Quick, M. A. Morin, R. C. Anderson, C. S. Desilets, T. E. Linnenbrink, and M. O'Donnell, "Delta sigma oversampled ultrasound beamformer with dynamic delays," *IEEE Trans. Ultrason., Ferroelect., Freq. Contr.*, vol. 46, 1999, pp. 320–332.
- [9] M. Kozak and M. Karaman, "Digital phased array beamforming using single-bit delta-sigma conversion with non-uniform oversampling," *IEEE Trans. Ultrason., Ferroelect., Freq. Contr.*, vol. 48, 2001, pp. 922–931.
- [10] Mawia A. Hassan and Yasser M. Kadah, "Digital Signal Processing Methodologies for Conventional Digital Medical Ultrasound Imaging System," *American Journal of Biomedical Engineering*, vol. 3(1), pp. 14-30, 2013.
- [11] C. Basoglu, R. Managuli, G. York, and Y. Kim, "Computing requirements of modern medical diagnostic ultrasound machines," *Parallel Computing*, vol. 24, 1998, pp. 1407-1431.
